2SNP: scalable phasing based on 2-SNP haplotypes

Authors

  • Dumitru Brinza
  • Alex Zelikovsky
Abstract

The 2SNP software package implements a new, very fast, scalable algorithm for haplotype inference based on genotype statistics collected only for pairs of SNPs. This software can be used for comparatively accurate phasing of a large number of long genome sequences, e.g. those obtained from DNA arrays. As input, 2SNP takes a genotype matrix and outputs the corresponding haplotype matrix. On datasets across 79 regions from HapMap, 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality, as measured by the number of correctly phased genotypes and by single-site and switching errors. For example, 2SNP requires 41 s on a 2 GHz Pentium 4 processor to phase 30 genotypes with 1381 SNPs (ENm010.7p15:2 data from HapMap), whereas GERBIL and PHASE require more than a week and make no fewer errors than 2SNP.
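The abstract does not spell out the algorithm, so the following is only an illustrative sketch of what "genotype statistics collected only for pairs of SNPs" could look like in practice. The genotype encoding (0 and 1 for the two homozygous states, 2 for heterozygous), the function names, and the simple product-comparison rule are all assumptions made for this example; they are not taken from 2SNP itself.

```python
from collections import Counter


def pair_haplotype_counts(genotypes, i, j):
    """Count 2-SNP haplotypes at sites i and j.

    Assumed encoding (hypothetical, for illustration only):
    0 or 1 = homozygous for that allele, 2 = heterozygous.
    Only genotypes whose phase at (i, j) is unambiguous contribute.
    """
    counts = Counter()
    for g in genotypes:
        a, b = g[i], g[j]
        if a == 2 and b == 2:
            continue  # double heterozygote: phase cannot be read off directly
        # Alleles carried by the two haplotypes at each site.
        alleles_i = (0, 1) if a == 2 else (a, a)
        alleles_j = (0, 1) if b == 2 else (b, b)
        counts[(alleles_i[0], alleles_j[0])] += 1
        counts[(alleles_i[1], alleles_j[1])] += 1
    return counts


def phase_double_heterozygote(counts):
    """Choose between the two possible phasings of a (2, 2) genotype.

    "cis" means haplotypes 00 and 11; "trans" means 01 and 10.
    The rule used here (compare products of observed counts) is a
    simplification chosen for this sketch, not the 2SNP criterion.
    """
    cis = counts[(0, 0)] * counts[(1, 1)]
    trans = counts[(0, 1)] * counts[(1, 0)]
    return "cis" if cis >= trans else "trans"


if __name__ == "__main__":
    # Rows are individuals, columns are SNP sites (toy data).
    genotype_matrix = [[0, 0], [0, 0], [1, 1], [2, 2], [0, 2]]
    stats = pair_haplotype_counts(genotype_matrix, 0, 1)
    print(dict(stats), phase_double_heterozygote(stats))
```

Because each pair of sites only needs one pass over the genotype matrix, statistics of this kind are cheap to collect, which is consistent with the speed claims in the abstract; how 2SNP combines pairwise decisions into full haplotypes is not described here.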

Similar resources

WinHAP: An Efficient Haplotype Phasing Algorithm Based on Scalable Sliding Windows

Haplotype phasing represents an essential step in studying the association of genomic polymorphisms with complex genetic diseases, and in determining targets for drug designing. In recent years, huge amounts of genotype data are produced from the rapidly evolving high-throughput sequencing technologies, and the data volume challenges the community with more efficient haplotype phasing algorithm...

Phasing of 2-SNP Genotypes Based on Non-random Mating Model

Emerging microarray technologies allow genotyping of long genome sequences resulting in huge amount of data. A key challenge is to provide an accurate phasing of very long single nucleotide polymorphism (SNP) sequences. In this paper we explore phasing of genotypes with 2 SNPs adjusted to the non-random mating model and then apply it to the haplotype inference of complete genotypes using maximu...

ILP Methods for Family Trio Phasing

In population genotyping, it is common to genotype family trios consisting of the two parents and their child since that allows to recover haplotypes with higher confidence. Interestingly, the available software tools are primarily intended to phase only unrelated genotypes. In this section we first formulate the problem and describe specificity of family trio phasing and then analyze existing ...

Phasing and Missing Data Recovery in Family Trios

Although there exist many phasing methods for unrelated adults or pedigrees, phasing and missing data recovery for data representing family trios is lagging behind. This paper is an attempt to fill this gap by considering the following problem. Given a set of genotypes partitioned into family trios, find for each trio a quartet of parent haplotypes which agree with all three genotypes and recov...

The linkage method: a novel approach for SNP detection and haplotype reconstruction from a single diploid individual using next-generation sequence data.

When we sequence a diploid individual, the output actually comprises two genomes: one from the paternal parent and the other from the maternal parent. In this study, we introduce a novel heuristic algorithm for distinguishing single-nucleotide polymorphisms (SNPs) from the two parents and phasing them into haplotypes. The algorithm is unique because it simultaneously performs SNP calling and ha...

Journal:
  • Bioinformatics

Volume 22, Issue 3

Pages: -

Publication date: 2006